Method for recognizing a keyword in speech

ABSTRACT

A keyword is recognized in spoken language by assuming that a start of this keyword is present at every sampling time. An attempt is then made to image this keyword onto a sequence of HMM statuses that represent the keyword. The best path in a presentation space is determined with the Viterbi algorithm, and a local confidence standard is employed instead of the emission probability used in the Viterbi algorithm. When a global confidence standard that is composed of local confidence standards downwardly crosses a lower barrier for the best Viterbi path, the keyword is recognized, and the sampling time assumed as the start of the keyword is confirmed.

BACKGROUND OF THE INVENTION

The invention is directed to a method for recognizing a keyword in spoken language.

A modelling of the complete spoken expression has hitherto always been required in the recognition of a keyword in spoken language. The person skilled in the art is familiar with essentially two methods:

M. Weintraub, “Keyword-spotting using SRI's DECIPHER large-vocabulary speech-recognition system”, Proc. IEEE ICASSP, vol. 2, 1993, pp. 463-466, discloses a method for the recognition of a keyword that employs a speech recognition unit with a large vocabulary. The attempt is thereby made to completely recognize the spoken language. Subsequently, the recognized words are investigated for potentially existing keywords. This method is complex and error-prone because of the large vocabulary and because of the problems in the modelling of spontaneous vocal expressions and noises, i.e. parts of the voice signal that cannot be unambiguously allocated to a word.

Another method employs specific filler models (also: garbage models) in order to model parts of expressions that do not belong to the vocabulary of the keywords (what are referred to as OOV parts; OOV=out of vocabulary). Such a speech recognition unit is described in H. Boulard, B. D'hoore and J.-M. Boite, “Optimizing recognition and rejection performance in wordspotting systems”, Proc. IEEE ICASSP, vol. 1, 1994, pages 373-376, and comprises the keywords as well as a filler model or a plurality of filler models. One difficulty is to design or train a suitable filler model that contrasts well with the modelled keywords, i.e. exhibits high discrimination with respect to the keyword models.

Further, hidden Markov models (HMMs) are known from L. R. Rabiner, B. H. Juang, “An Introduction to Hidden Markov Models”, IEEE ASSP Magazine, 1986, pp. 4-16, or A. Hauenstein, “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung”, Doctoral Dissertation at the Chair for Integrated Circuits of the Technical University, Munich, Jul. 19, 1993, pp. 13-35. It is also known from Rabiner et al or Hauenstein to define a best path with the Viterbi algorithm.

Hidden Markov models (HMMs) serve the purpose of describing discrete stochastic processes (also called Markov processes). In the field of speech recognition, hidden Markov models serve, among other things, for building up a word lexicon in which the word models constructed of sub-units are listed. Formally, a hidden Markov model is described by:

λ=(A, B, π)  (0-1)

with a square status transition matrix A that contains status transition probabilities A_(ij):

A={A_(ij)} with i,j=1, . . . ,N  (0-2)

and an emission matrix B that comprises emission probabilities B_(ik):

B={B_(ik)} with i=1, . . . ,N; k=1, . . . ,M  (0-3)

An N-dimensional vector π serves for initialization; it defines the occurrence probability of the N statuses for the point in time t=1:

π={π_(i)} with π_(i)=P(s(1)=s_(i))  (0-4)

In general,

P(s(t)=q _(t))  (0-5)

thereby indicates the probability that the Markov chain

s={s(1),s(2),s(3), . . . ,s(t), . . . }  (0-6)

is in status q_(t) at time t. The Markov chain s thereby comprises a value range

s(t)∈{s₁,s₂, . . . ,s_(N)}  (0-7)

whereby this value range contains a finite set of N statuses. The status in which the Markov process is at time t is called q_(t).

The emission probability B_(ik) derives from the occurrence of a specific symbol σ_(k) in the status s_(i) as

B_(ik)=P(σ_(k)|q_(t)=s_(i))  (0-8)

whereby a character stock Σ having the size M comprises symbols σ_(k) (with k=1 . . . M) according to

Σ={σ₁,σ₂, . . . ,σ_(M)}  (0-9)

A status space of hidden Markov models derives in that every status of the hidden Markov model can have a predetermined set of successor statuses: itself, the next status, the next but one status, etc. The status space with all possible transitions is referred to as a trellis. Given hidden Markov models of the order 1, a past lying more than one time step back is irrelevant.

The Viterbi algorithm is based on the idea that, when one is locally on an optimum path in the status space (trellis), this is always a component part of a global optimum path. Due to the order 1 of the hidden Markov models, only the best predecessor of a status is to be considered, since the poorer predecessors have received a poorer evaluation in advance. This means that the optimum path can be sought recursively time step by time step beginning from the first point in time, in that all possible continuations of the path are identified for each time step and only the best continuation is selected.
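The recursion described above can be illustrated with a minimal Python sketch. It works in the negative logarithmic range, so that the best path is the one with the lowest accumulated cost. The function names `neg_log` and `viterbi` and the list-based data layout are assumptions made for this illustration and are not taken from Rabiner et al or Hauenstein.

```python
import math

def neg_log(p):
    """Negative logarithm of a probability; impossible events cost infinity."""
    return -math.log(p) if p > 0.0 else math.inf

def viterbi(neg_log_pi, neg_log_a, neg_log_b):
    """Return (best_cost, best_path) through the trellis.

    neg_log_pi[i]   : -log of the initial probability of status i
    neg_log_a[i][j] : -log of the transition probability from status i to j
    neg_log_b[t][j] : -log of the emission probability at time t in status j
    """
    n = len(neg_log_pi)
    cost = [neg_log_pi[j] + neg_log_b[0][j] for j in range(n)]
    back = []  # back[t-1][j] is the best predecessor of status j at time t
    for t in range(1, len(neg_log_b)):
        new_cost, pointers = [], []
        for j in range(n):
            # Order 1: only the best predecessor of each status is kept.
            k_best = min(range(n), key=lambda k: cost[k] + neg_log_a[k][j])
            new_cost.append(cost[k_best] + neg_log_a[k_best][j] + neg_log_b[t][j])
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)
    # Backtracking from the cheapest final status.
    state = min(range(n), key=lambda j: cost[j])
    best_cost = cost[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best_cost, path
```

For a two-status left-to-right model, the function returns the status sequence with the highest joint probability together with its negative log probability.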

A respective modelling of the OOV parts is required given the methods described in Weintraub and Boulard et al. In the former instance of Weintraub, the words of the expression must be explicitly present in the vocabulary of the recognition unit; in the latter instance of Boulard et al, all OOV words and OOV noises are represented by specific filler models.

SUMMARY OF THE INVENTION

The object of the invention is to specify a method that enables the recognition of a keyword in spoken language, whereby the above-described disadvantages are avoided.

According to the method of the invention for recognizing a keyword in spoken language, the keyword is represented by a sequence of statuses W of hidden Markov models. The spoken language is sampled with a predetermined rate and a feature vector O_(t) is produced at every sampling time t for a voice signal from the spoken language belonging to the sampling time t. The sequence O of feature vectors O_(t) is imaged onto the sequence of the statuses with a Viterbi algorithm, whereby a local confidence standard is calculated on the basis of an emission standard at a status. With the Viterbi algorithm, a global confidence standard is supplied. The keyword in the spoken language is recognized when the following applies:

A method for recognizing a keyword in spoken language, comprising the steps of representing the keyword by a sequence of statuses W of hidden Markov models; sampling the spoken language with a predetermined rate and providing a feature vector O_(t) at every sampling time t for a voice signal from the spoken language belonging to the sampling time t; imaging a sequence O of feature vectors O_(t) onto the sequence of statuses with a Viterbi algorithm, whereby a local confidence standard is calculated on the basis of an emission standard at a status; with the Viterbi algorithm supplying a global confidence standard; recognizing the keyword in the spoken language when the following applies: C(W, O)<T,

where

C( ) denotes the confidence standard,

W denotes the keyword, presented as a sequence of statuses,

O denotes the sequence of feature vectors O_(t),

T denotes a predetermined threshold.

Otherwise, the keyword in the spoken language is not recognized.

One advantage of the invention is that a keyword is recognized within the spoken language without the expression having to be modelled overall. As a result thereof, a clearly reduced expense derives in the implementation and, accordingly, a higher-performance (faster) method. By employing the (global) confidence standard C as the underlying decoding principle, the acoustic modelling within the decoding procedure is limited to the keywords.

One development is that a new path through the status space of the hidden Markov models begins in a first status of the sequence of statuses W at each sampling time t. As a result thereof, it is assumed at every sampling time that a beginning of a keyword is contained in the spoken language. On the basis of the confidence standard, feature vectors resulting from following sampling times are imaged onto those statuses of the keyword represented by hidden Markov models. A global confidence standard derives at the end of the imaging, i.e. at the end of the path, with reference whereto a decision is made as to whether the presumed beginning of the keyword was in fact such a beginning. If yes, the keyword is recognized; otherwise, it is not recognized.

Within the scope of a development of the invention, the global confidence standard C is defined by

C=−log P(W|O)  (2)

and the corresponding local confidence standard c is defined by

$\begin{matrix}{c = -\log\frac{P(O_{t} \mid s_{j}) \cdot P(s_{j})}{P(O_{t})},} & (3)\end{matrix}$

whereby

s_(j) denotes a status of the sequence of statuses,

P(W|O) denotes a probability for the keyword under the condition of a sequence of feature vectors O_(t),

P(O_(t)|s_(j)) denotes the emission probability,

P(s_(j)) denotes the probability for the status s_(j),

P(O_(t)) denotes the probability for the feature vector O_(t).

A suitable global confidence standard is characterized by the property that it provides information about the degree of dependability with which a keyword is detected. In the negative logarithmic range, a small value of the global confidence standard C expresses a high dependability.

Within the scope of an additional development, the confidence standard C is defined by

$\begin{matrix}{C = -\log\frac{P(O \mid W)}{P(O \mid \overline{W})}} & (4)\end{matrix}$

and the corresponding local confidence standard is defined by

$\begin{matrix}{c = -\log\frac{P(O_{t} \mid s_{j})}{P(O_{t} \mid \overline{s_{j}})}} & (5)\end{matrix}$

whereby

$P(O \mid \overline{W})$ denotes the probability for the sequence of feature vectors O_(t) under the condition that the keyword W does not occur,

$\overline{s_{j}}$ denotes the counter-event for the status s_(j) (i.e.: not the status s_(j)).

The advantage of the illustrated confidence standards is, among other things, that they can be calculated, i.e. no prior training and/or estimating is required.

The definition of the local confidence standards can be respectively derived from the definitions of the global confidence standards. Local confidence standards enter into the calculation of the confidence standard for a keyword at those points in time that coincide in time with the expression of this keyword.

The local confidence standards can be calculated with the relationships

$\begin{matrix}{P(O_{t}) = \sum\limits_{k} P(O_{t} \mid s_{k}) \cdot P(s_{k})\quad\text{and}} & (6) \\ {P(O_{t} \mid \overline{s_{j}}) = \sum\limits_{k \neq j} P(O_{t} \mid s_{k}) \cdot P(s_{k})} & (7)\end{matrix}$

Further, it is possible to determine P(O_(t)) or, respectively, $P(O_{t} \mid \overline{s_{j}})$ with suitable approximation methods. An example of such an approximation method is the averaging of the n-best emissions −log P(O_(t)|s_(j)) at every time t.
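How the two local confidence standards follow from the emission probabilities and the a priori status probabilities can be sketched in a few lines of Python. The sketch evaluates Equations (6) and (7) exactly, i.e. it does not use the n-best approximation. The function name `local_confidences` and the use of plain lists are assumptions made for this illustration.

```python
import math

def local_confidences(emission, prior, j):
    """Return (c1, c2) for status j at one sampling time t.

    emission[k] : P(O_t | s_k), emission probability in status s_k
    prior[k]    : P(s_k), a priori probability of status s_k
    """
    weighted = [e * p for e, p in zip(emission, prior)]
    p_obs = sum(weighted)                                              # Equation (6)
    p_obs_not_j = sum(w for k, w in enumerate(weighted) if k != j)     # Equation (7)
    c1 = -math.log(weighted[j] / p_obs)                                # Equation (3)
    c2 = -math.log(emission[j] / p_obs_not_j)                          # Equation (5)
    return c1, c2
```

With these definitions, c1 can never be negative, because it is the negative logarithm of an a posteriori probability, whereas c2 becomes negative as soon as the status s_(j) explains the feature vector better than the remaining statuses taken together.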

The decoding procedure is usually implemented with the assistance of the Viterbi algorithm:

$C_{t,s_{j}} = \min\limits_{k}\left( C_{t-1,s_{k}} + c_{t,s_{j}} + a_{kj} \right),$

where

C_(t,s_j) denotes the global, accumulated confidence standard at time t in the status s_(j),

C_(t−1,s_k) denotes the global, accumulated confidence standard at the time t−1 in the status s_(k),

c_(t,s_j) denotes the local confidence standard at the time t in the status s_(j),

a_(kj) denotes a transition penalty from the status s_(k) into the status s_(j).

Since no local confidence standards outside the time limits of the keyword are required for a presentation of the global confidence standard for a keyword, an acoustic modelling of the OOV parts can be forgone in the search for the keyword.

By applying the Viterbi algorithm with the possibility of starting a new path in the first status of a keyword at every time t, whereby the keyword is preferably subdivided into individual statuses of a hidden Markov model (HMM), the global confidence standard is optimized for every keyword and, at the same time, the optimum starting time is determined (backtracking of the Viterbi algorithm).
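The search with a new path starting at every time t can be sketched as follows in Python. Several simplifications are assumptions of this sketch and are not prescribed by the text: the keyword is a strict left-to-right chain in which a status may only be repeated or followed by the next status, the transition penalties are two constants, a newly started path enters with an accumulated value of zero, and the starting time is carried along with each path instead of being recovered by explicit backtracking. The function name `spot_keyword` is hypothetical.

```python
import math

def spot_keyword(local_conf, stay_penalty=0.0, next_penalty=0.0):
    """For every time t, return (C, start) for the best path ending in the
    last status of the keyword at time t.

    local_conf[t][j] : local confidence standard c_{t,s_j}
    """
    n = len(local_conf[0])
    prev = [(math.inf, None)] * n  # (accumulated C, starting time) per status
    results = []
    for t, c_t in enumerate(local_conf):
        cur = []
        for j in range(n):
            candidates = [(prev[j][0] + stay_penalty, prev[j][1])]  # remain in s_j
            if j > 0:
                # advance from the preceding status
                candidates.append((prev[j - 1][0] + next_penalty, prev[j - 1][1]))
            else:
                # assumed start of the keyword at this sampling time
                candidates.append((0.0, t))
            best_c, best_start = min(candidates, key=lambda x: x[0])
            cur.append((best_c + c_t[j], best_start))
        prev = cur
        results.append(cur[-1])
    return results
```

Each entry of the returned list corresponds to one sampling time and can be compared directly with the threshold T; the second component is the starting time that belongs to the best path ending there.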

For a predetermined time span, it is also expedient to seek a minimum below the threshold T. Multiple recognition of a keyword within this predetermined time span is thereby avoided.

When there are keywords that are similar to one another in view of their descriptive form represented by the respective sequence of statuses, then it is useful to utilize a mechanism that, given recognition of a keyword, precludes that another keyword was partially contained in the spoken voice signal in the time span of the recognized keyword.

Exemplary embodiments of the invention are presented with reference to the following Figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block circuit diagram of a method for recognizing a keyword in spoken language;

FIG. 2 is a sketch that illustrates the determination of a confidence standard;

FIG. 3 is a sketch like FIG. 2 that shows the curve of an assumed confidence standard over a predetermined time span.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a block circuit diagram of a method for recognizing a keyword in continuous speech.

In a step 101, the keyword is represented by a sequence of statuses W. Phoneme HMMs, each having three statuses, are preferably utilized for this purpose (see Rabiner et al). In a next step 102, the continuous speech is sampled, and a feature vector O_(t) for a voice signal belonging to the sampling time t is produced from the continuous speech at every sampling time t. As components, the feature vector O_(t) thereby comprises a prescribed set of features that are characteristic of the voice signal at the sampling time t.

In a step 103, a sequence of feature vectors that have been acquired from the voice signal for various sampling times t is imaged onto the sequence of statuses W. The Viterbi algorithm thereby represents one imaging rule (see Rabiner et al). The negative logarithm of the emission probability, −log P(O_(t)|s_(j)), utilized in the Viterbi algorithm is thereby replaced by a local confidence standard. In a step 104, the Viterbi algorithm supplies a global confidence standard C at every point in time, this comprising individual, local confidence standards in cumulative form for the detected statuses of the sequence of statuses W. In a step 105, the keyword is considered recognized in the continuous speech when the following applies:

C(W,O)<T  (1),

whereby

C( ) denotes the global confidence standard, W denotes the keyword presented as a sequence of statuses, O denotes the sequence of feature vectors O_(t), and T denotes a predetermined threshold.

Otherwise, the keyword is not recognized in the continuous speech.

Two possible realizations for a global confidence standard and a respectively corresponding local confidence standard are described below. Other confidence standards are conceivable.

First Confidence Standard

The first global confidence standard is defined from the negative logarithm of an a posteriori probability for the keyword as a dependability criterion:

C₁=−log P(W|O)  (2).

Bayes' rule is subsequently applied in conjunction with the following assumptions:

$\begin{matrix}{P(O) = \prod\limits_{t} P(O_{t}),} & (8) \\ {P(W) = \prod\limits_{t} P(s_{\psi(t)}),} & (9) \\ {P(O \mid W) = \prod\limits_{t} \left\lbrack P(O_{t} \mid s_{\psi(t)}) \cdot a_{\psi(t-1),\psi(t)} \right\rbrack.} & (10)\end{matrix}$

The probability for a sequence of feature vectors P(O) is thereby expressed as a multiplication of probabilities for individual feature vectors P(O_(t)). The probability for an entire word P(W) is calculated in the same way in that the individual probabilities P(s_(ψ(t))) of each individual, selected status of an HMM are multiplied, whereby the function ψ(t) is an imaging of the feature vectors (i.e. of the time) onto the statuses of the keyword. The conditional probability P(O|W) corresponds to the ordinary probability of the HMM that can be calculated with the emission probabilities P(O_(t)|s_(ψ(t))) and the transition probabilities a_(ψ(t−1),ψ(t)). The global confidence standard C₁ thus derives as:

$\begin{matrix}{C_{1} = \sum\limits_{t} -\log\left( \frac{P(O_{t} \mid s_{\psi(t)}) \cdot P(s_{\psi(t)})}{P(O_{t})} \cdot a_{\psi(t-1),\psi(t)} \right).} & (11)\end{matrix}$
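The intermediate steps from Equation (2) to Equation (11) can be written out as follows. This is only a sketch of the algebra under the assumptions (8) to (10); it introduces no symbols beyond those already defined.

$\begin{aligned} C_{1} &= -\log P(W \mid O) = -\log\frac{P(O \mid W) \cdot P(W)}{P(O)} \\ &= -\log\prod\limits_{t}\frac{P(O_{t} \mid s_{\psi(t)}) \cdot a_{\psi(t-1),\psi(t)} \cdot P(s_{\psi(t)})}{P(O_{t})} \\ &= \sum\limits_{t} -\log\left( \frac{P(O_{t} \mid s_{\psi(t)}) \cdot P(s_{\psi(t)})}{P(O_{t})} \cdot a_{\psi(t-1),\psi(t)} \right) \end{aligned}$

The first line is Bayes' rule, the second line inserts the three product assumptions over the same time index t, and the third line turns the logarithm of the product into a sum of logarithms.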

When the operation of the Viterbi algorithm is considered, then the definition of a local confidence standard c₁(O_(t)|s_(j)) that is used within the search of the Viterbi algorithm is advisable:

$\begin{matrix}{c_{1}(O_{t} \mid s_{j}) = -\log\frac{P(O_{t} \mid s_{j}) \cdot P(s_{j})}{P(O_{t})}.} & (12)\end{matrix}$

The probability of the feature vector that appears in the denominator of Equation (12) can be calculated in that all statuses of the HMM are taken into consideration:

$\begin{matrix}{P(O_{t}) = \sum\limits_{k} P(O_{t} \mid s_{k}) \cdot P(s_{k})} & (13)\end{matrix}$

(also see Equation (6)).

The a priori probability P(s_(k)) of these statuses has been determined in the preceding training. The local confidence standard c₁(O_(t)|s_(j)) can thus be completely calculated.

A Second Confidence Standard

The definition of a second global confidence standard is composed of the ratio of the conditional probabilities of a sequence O of feature vectors O_(t) under the condition of a sequence of statuses W identifying the keyword, in the one instance, and, in another instance, under the inverse model $\overline{W}$. The following derives:

$\begin{matrix}{C_{2} = -\log\frac{P(O \mid W)}{P(O \mid \overline{W})}.} & (4)\end{matrix}$

$\overline{W}$ thereby only represents a model that does not really exist but whose emission probability can be calculated. In contrast to the definition of the first global confidence standard, this definition leads to a symmetrical global confidence standard that exhibits a symmetry center at 0 when

$P(O \mid W) = P(O \mid \overline{W})$  (14)

is met. Analogous to the case for the first global confidence standard, the following equation derives by insertion of Equations (8), (9) and (10), taking the respectively inverse models $\overline{a_{\psi(t-1),\psi(t)}}$ and $\overline{s_{\psi(t)}}$ into consideration:

$\begin{matrix}{C_{2} = \sum\limits_{t} -\log\frac{P(O_{t} \mid s_{\psi(t)}) \cdot a_{\psi(t-1),\psi(t)}}{P(O_{t} \mid \overline{s_{\psi(t)}}) \cdot \overline{a_{\psi(t-1),\psi(t)}}}.} & (15)\end{matrix}$

A suitable local confidence standard c₂(O_(t)|s_(j)) that can be employed in the search carried out by the Viterbi algorithm is defined as:

$\begin{matrix}{c_{2}(O_{t} \mid s_{j}) = -\log\frac{P(O_{t} \mid s_{j})}{P(O_{t} \mid \overline{s_{j}})}.} & (16)\end{matrix}$

In this case, too, the local confidence standard c₂(O_(t)|s_(j)) can be calculated, since the denominator can be calculated by taking into consideration all weighted emission probabilities except for P(O_(t)|s_(j)) itself:

$\begin{matrix}{P(O_{t} \mid \overline{s_{j}}) = \sum\limits_{k \neq j} P(O_{t} \mid s_{k}) \cdot P(s_{k})} & (7)\end{matrix}$

(also see Equation (7)).

The two definitions thus lead to a confidence standard that, given a low value (a negative value in the case of the global confidence standard C₂), indicates a high dependability that a keyword has been correctly recognized.

One advantage of this calculatable confidence standard is that additional HMMs need not be trained, nor is a dexterous manipulation of other affected parameters necessary. The confidence standards can be calculated upon employment of general phoneme HMMs.

The confidence standards defined above can be used with a Viterbi search based on hidden Markov models. Each individual status s_(j) of the HMMs then emits not the negative logarithm of a probability P(O_(t)|s_(j)) but a local confidence standard c₁ or c₂ instead.

FIG. 2 shows a sketch that illustrates the determination of a confidence standard.

In the upper diagram of FIG. 2, discrete times t₁, t₂, . . . are shown on the abscissa and the keyword SW characterized by a sequence of statuses ZS is shown on the ordinate. A continuous speech signal is shown over a time axis t in FIG. 2.

The continuous speech signal can contain a plurality of keywords, even different keywords, whereby only one keyword is preferably contained at one point in time.

The continuous voice signal is sampled at discrete times, and the information present at the respective sampling time is stored in a feature vector O_(t). It is inventively assumed that a keyword can begin at each of these sampling times. A potential keyword whose paths can re-combine during the course of the Viterbi algorithm thus respectively begins at each of the times t₁, t₂ or t₃. For simplification, one keyword is assumed here, whereby a plurality of keywords require a respective method for each keyword to be recognized.

When, thus, the keyword begins at time t₁, then an imaging of the feature vectors following the time t₁ is undertaken on the basis of the feature vectors O_(t) acquired from the continuous speech. The best path PF with respect to the accumulated confidence standard is respectively determined. A confidence standard C derives for each time t. The value of the confidence standard provides information as to whether or not the keyword was contained in the continuous speech and ended at time t.

By way of example, paths are entered in FIG. 2 that begin at times t₁, t₂ and t₃ and, at times t₄, t₅ and t₆, lead to the global confidence standards C^(I), C^(II) and C^(III). The global confidence standards C^(I) and C^(II) correspond to the possible keyword beginning in t₁, whereas the global confidence standard C^(III) is best achieved by a path that begins in t₂.

Let it thereby be noted that a global confidence standard C is observed at every time t, whereby a corresponding starting time is determined by application of the Viterbi algorithm.

When the continuous speech contains something completely different from the keyword, then the confidence standard is correspondingly poor; no recognition occurs. According to the functioning of the Viterbi algorithm, the lengths of the various paths for determining the global confidence standard are also not equal, as indicated in that the global confidence standard C^(I) is formed from the local confidence standards of four statuses, whereas the global confidence standards C^(II) and C^(III) are composed of the local confidence standards of five statuses. The duration of the corresponding keywords thus derives as 4Δt and as 5Δt.

FIG. 3 illustrates this relationship. The global confidence standards C^(I), C^(II) and C^(III) determined from FIG. 2 are entered by way of example at the ordinate in FIG. 3. The abscissa again identifies the time t.

A separate global confidence standard C respectively derives for each time t.

A minimum MIN of the global confidence standard C is preferably determined, and it is thus assumed that the keyword in the continuous speech is present at this minimum MIN.

This is of significance insofar as the threshold T for the global confidence standard is already downwardly transgressed, i.e. the keyword is recognized, at a time t_(a). In view of the variable dynamic adaptation (different time durations for determining the global confidence standard), however, and as shown here in FIG. 3 by way of example, this keyword can be recognized “even better” at immediately following times t_(a+1). In order to determine when the keyword is optimally recognized, the minimum MIN with the corresponding time t_(MIN) is identified. Proceeding from this time t_(MIN), the starting time of the keyword in the continuous voice signal is determined with backtracking (see Rabiner et al). The start of the spoken keyword in the continuous voice signal is thus determined.
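The search for the minimum MIN after the first downward crossing of the threshold T can be sketched as follows in Python. The sketch assumes that a global confidence standard and the associated starting time are already available for every time t as a pair (C, start), and that the predetermined time span is given as a number of sampling times; the function name `best_detection` and the parameter `window` are hypothetical.

```python
def best_detection(scores, threshold, window):
    """Return (t_min, C_min, start) for the first detection, or None.

    scores[t] : (C, start), global confidence standard at time t and the
                starting time of the path that produced it
    window    : number of sampling times examined from the crossing t_a on
    """
    for t_a, (c, _) in enumerate(scores):
        if c < threshold:
            # The threshold is crossed at t_a; look for an even better value
            # within the predetermined time span that follows.
            span = range(t_a, min(t_a + window, len(scores)))
            t_min = min(span, key=lambda t: scores[t][0])
            return t_min, scores[t_min][0], scores[t_min][1]
    return None
```

Reporting only the minimum within the time span yields one detection per spoken keyword, even when the threshold is undershot at several neighbouring sampling times.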

Let it thereby be noted that such a determination of the minimum can be implemented for each keyword; however, no other keyword can be recognized for the duration of a keyword. When a plurality of overlapping keywords are recognized in parallel from the continuous speech, then the keyword whose confidence standard reflects the highest dependability compared to the other keywords is preferably taken to be the right one.
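One conceivable way of resolving overlapping detections is sketched below in Python. The greedy strategy (accept the most dependable detection first, then discard everything that overlaps it in time) is an assumption of this sketch; the text only requires that the keyword with the best confidence standard be preferred. The function name `resolve_overlaps` and the tuple layout are hypothetical.

```python
def resolve_overlaps(detections):
    """Keep the most dependable keyword among detections that overlap in time.

    detections : list of (keyword, start, end, C); a lower C means a higher
                 dependability
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[3]):
        _, start, end, _ = det
        # Accept a detection only if it lies entirely outside every time span
        # that is already occupied by a more dependable keyword.
        if all(end < k_start or start > k_end for _, k_start, k_end, _ in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d[1])
```

The result is ordered by starting time, so that it can be read as the sequence of keywords recognized in the continuous speech.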

Although various minor changes and modifications might be proposed by those skilled in the art, it will be understood that our wish is to include within the claims of the patent warranted hereon all such changes and modifications as reasonably come within our contribution to the art.

What is claimed is:
 1. A method for recognizing a keyword in spoken language, comprising the steps of: representing the keyword by a sequence of statuses W of hidden Markov models; sampling the spoken language with a predetermined rate and producing a feature vector O_(t) at every sampling time t for a voice signal from the spoken language belonging to the sampling time t; imaging a sequence O of feature vectors O_(t) onto the sequence of statuses with a Viterbi algorithm, whereby a local confidence standard is calculated on the basis of an emission standard at a status; with the Viterbi algorithm supplying a global confidence standard; recognizing the keyword in the spoken language when the following applies: C(W,O)<T, where C( ) denotes the confidence standard, W denotes the keyword, presented as a sequence of statuses, O denotes the sequence of feature vectors O_(t), and T denotes a predetermined threshold; and otherwise, the keyword in the spoken language is not recognized.
 2. The method according to claim 1 wherein the emission standard is a negative logarithm of an emission probability.
 3. The method according to claim 1 wherein a new path in a first status of the sequence of statuses W begins at every sampling time.
 4. The method according to claim 1 wherein the Viterbi algorithm supplies a global confidence standard at every sampling time.
 5. The method according to claim 1 wherein the confidence standard C is defined by C=−log P(W|O) and the corresponding local confidence standard c is defined by

$c = -\log\frac{P(O_{t} \mid s_{j}) \cdot P(s_{j})}{P(O_{t})}$

where s_(j) denotes a status of the sequence of statuses.
 6. The method according to claim 1 wherein the confidence standard C is defined by

$C = -\log\frac{P(O \mid W)}{P(O \mid \overline{W})}$

and the corresponding local confidence standard is defined by

$c = -\log\frac{P(O_{t} \mid s_{j})}{P(O_{t} \mid \overline{s_{j}})}$

where $\overline{W}$ denotes not the keyword W, and $\overline{s_{j}}$ denotes not the status s_(j).
 7. The method according to claim 1 wherein the global confidence standard is determined for a predetermined time duration, and conclusions are drawn about a starting time of the keyword from a minimum of the global confidence standard.
 8. The method according to claim 7 wherein the minimum lies below a predetermined threshold.
 9. The method of claim 1 wherein for recognizing a plurality of keywords, the method is employed in parallel for each keyword, whereby the keyword with a better confidence standard is recognized as soon as a plurality of prescribed thresholds are downwardly transgressed.
 10. The method according to claim 9 wherein no further keyword is recognized for the time span in which a keyword that has been recognized was contained in the spoken language.
 11. A method for recognizing a keyword in spoken language, comprising the steps of: representing the keyword by a sequence of statuses W of hidden Markov models; sampling the spoken language with a predetermined rate and producing a feature vector O_(t) at every sampling time t for a voice signal from the spoken language belonging to the sampling time t; imaging a sequence O of feature vectors O_(t) onto the sequence of statuses with a Viterbi algorithm, whereby a local confidence standard is calculated on the basis of an emission standard at a status; with the Viterbi algorithm supplying a global confidence standard; and recognizing the keyword in the spoken language when the following applies: C(W,O)<T, where C( ) denotes the confidence standard, W denotes the keyword, presented as a sequence of statuses, O denotes the sequence of feature vectors O_(t), and T denotes a predetermined threshold.